
    From statistical evidence to evidence of causality

    While statisticians and quantitative social scientists typically study the "effects of causes" (EoC), Lawyers and the Courts are more concerned with understanding the "causes of effects" (CoE). EoC can be addressed using experimental design and statistical analysis, but it is less clear how to incorporate statistical or epidemiological evidence into CoE reasoning, as might be required for a case at Law. Some form of counterfactual reasoning, such as the "potential outcomes" approach championed by Rubin, appears unavoidable, but this typically yields "answers" that are sensitive to arbitrary and untestable assumptions. We must therefore recognise that a CoE question simply might not have a well-determined answer. It is nevertheless possible to use statistical data to set bounds within which any answer must lie. With less than perfect data these bounds will themselves be uncertain, leading to a compounding of different kinds of uncertainty. Still further care is required in the presence of possible confounding factors. In addition, even identifying the relevant "counterfactual contrast" may be a matter of Policy as much as of Science. Defining the question is as non-trivial a task as finding a route towards an answer. This paper develops some technical elaborations of these philosophical points from a personalist Bayesian perspective, and illustrates them with a Bayesian analysis of a case study in child protection.
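
The idea of bounding a CoE answer from statistical data can be illustrated with one standard form of bounds on the "probability of causation" for an exposed individual who developed the outcome. The sketch below is not taken from the paper: it assumes the two outcome rates come from an unconfounded (for example, randomised) study of a population comparable to the individual, and the function and parameter names are hypothetical.

```python
def probability_of_causation_bounds(p_outcome_exposed, p_outcome_unexposed):
    """Bounds on the probability that exposure caused an observed outcome.

    Assumption (not from the source): both arguments are outcome rates
    estimated without confounding, P(Y=1 | exposed) and P(Y=1 | unexposed).
    """
    if not 0.0 < p_outcome_exposed <= 1.0:
        raise ValueError("p_outcome_exposed must lie in (0, 1]")
    if not 0.0 <= p_outcome_unexposed <= 1.0:
        raise ValueError("p_outcome_unexposed must lie in [0, 1]")

    # Lower bound: 1 - 1/RR, where RR is the risk ratio, floored at 0.
    lower = max(0.0, 1.0 - p_outcome_unexposed / p_outcome_exposed)
    # Upper bound: P(Y=0 | unexposed) / P(Y=1 | exposed), capped at 1.
    upper = min(1.0, (1.0 - p_outcome_unexposed) / p_outcome_exposed)
    return lower, upper


# Example: 30% of exposed and 12% of unexposed subjects develop the outcome.
lower, upper = probability_of_causation_bounds(0.30, 0.12)
print(f"probability of causation lies in [{lower:.2f}, {upper:.2f}]")
```

The data alone do not narrow the interval further, which matches the abstract's point that a CoE question may have no well-determined answer. Sampling error in the two rates would make the bounds themselves uncertain.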

    β models for random hypergraphs with a given degree sequence

    We introduce the beta model for random hypergraphs in order to represent the occurrence of multi-way interactions among agents in a social network. This model builds upon and generalizes the well-studied beta model for random graphs, which instead only considers pairwise interactions. We provide two algorithms for fitting the model parameters, iterative proportional scaling (IPS) and a fixed-point algorithm, prove that both algorithms converge if the maximum likelihood estimator (MLE) exists, and provide algorithmic and geometric ways of dealing with the issue of MLE existence.
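
For orientation, here is a minimal sketch of a fixed-point iteration for the ordinary (pairwise) beta model for graphs, the special case that the hypergraph model generalizes. It is not the paper's hypergraph algorithm. It assumes edge probabilities of the form p_ij = exp(b_i + b_j) / (1 + exp(b_i + b_j)) and solves the likelihood equations "observed degree = expected degree". The function names and the fixed iteration count are illustrative choices.

```python
import math


def expected_degrees(beta):
    """Expected degree of each node under the pairwise beta model."""
    n = len(beta)
    return [
        sum(1.0 / (1.0 + math.exp(-(beta[i] + beta[j])))
            for j in range(n) if j != i)
        for i in range(n)
    ]


def fit_beta_fixed_point(degrees, iterations=500):
    """Iterate b_i <- log d_i - log sum_{j != i} 1 / (exp(-b_j) + exp(b_i)).

    Assumes every degree is strictly positive and that the MLE exists;
    otherwise the iteration need not converge.
    """
    n = len(degrees)
    beta = [0.0] * n
    for _ in range(iterations):
        # The comprehension reads the previous iterate for every node,
        # so all coordinates are updated simultaneously.
        beta = [
            math.log(degrees[i]) - math.log(
                sum(1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                    for j in range(n) if j != i)
            )
            for i in range(n)
        ]
    return beta


# Usage: recover known parameters from the degrees they imply.
true_beta = [-0.5, 0.0, 0.3, 0.8, -0.2]
fitted = fit_beta_fixed_point(expected_degrees(true_beta))
print([round(b, 4) for b in fitted])
```

A fixed point of this update satisfies the degree-matching equations exactly, which is why the existence of the MLE matters: if the observed degree sequence lies on the boundary of the set of attainable expected degrees, no finite parameter vector solves them.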

    Differentially Private Model Selection with Penalized and Constrained Likelihood

    In statistical disclosure control, the goal of data analysis is twofold: the released information must provide accurate and useful statistics about the underlying population of interest, while minimizing the potential for an individual record to be identified. In recent years, the notion of differential privacy has received much attention in theoretical computer science, machine learning, and statistics. It provides a rigorous and strong notion of protection for individuals' sensitive information. A fundamental question is how to incorporate differential privacy into traditional statistical inference procedures. In this paper we study model selection in multivariate linear regression under the constraint of differential privacy. We show that model selection procedures based on penalized least squares or likelihood can be made differentially private by a combination of regularization and randomization, and propose two algorithms to do so. We show that our private procedures are consistent under essentially the same conditions as the corresponding non-private procedures. We also find that under differential privacy, the procedure becomes more sensitive to the tuning parameters. We illustrate and evaluate our method using simulation studies and two real data examples.

    Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models

    Motivated by a real-life problem of sharing social network data that contain sensitive personal information, we propose a novel approach to release and analyze synthetic graphs in order to protect the privacy of individual relationships captured by the social network while maintaining the validity of statistical results. A case study using a version of the Enron e-mail corpus dataset demonstrates the application and usefulness of the proposed techniques in solving the challenging problem of maintaining privacy and supporting open access to network data, so as to ensure reproducibility of existing studies and enable the discovery of new scientific insights that can be obtained by analyzing such data. We use a simple yet effective randomized response mechanism to generate synthetic networks under ϵ-edge differential privacy, and then use likelihood-based inference for missing data and Markov chain Monte Carlo techniques to fit exponential-family random graph models to the generated synthetic networks. Comment: Updated, 39 pages.
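
The randomized response step can be sketched as follows. This is the textbook form of the mechanism, in which each potential edge of an undirected graph is flipped independently with probability 1 / (1 + exp(ϵ)). The abstract does not give the exact variant used in the paper, so the function names, the edge representation, and the undirected, no-self-loop setting are assumptions made for illustration.

```python
import math
import random


def flip_probability(epsilon):
    """Per-dyad flip probability that yields epsilon-edge differential privacy."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response_graph(n_nodes, edges, epsilon, rng=None):
    """Return a synthetic edge set for an undirected graph on n_nodes nodes.

    Every unordered pair (i, j) with i < j keeps its true state with
    probability exp(epsilon) / (1 + exp(epsilon)) and is flipped otherwise.
    """
    rng = rng or random.Random()
    p_flip = flip_probability(epsilon)
    true_edges = {frozenset(edge) for edge in edges}

    synthetic = set()
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            present = frozenset((i, j)) in true_edges
            if rng.random() < p_flip:
                present = not present  # flip this dyad
            if present:
                synthetic.add((i, j))
    return synthetic


# Usage: a 5-node path graph released with epsilon = 1.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(sorted(randomized_response_graph(5, path, epsilon=1.0, rng=random.Random(0))))
```

Because the ratio of "keep" to "flip" probabilities is exp(ϵ), changing one true edge changes the likelihood of any released graph by at most that factor. The released graph is a noisy version of the truth, which is why the abstract pairs the mechanism with missing-data likelihood methods rather than fitting the model to the synthetic graph directly.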

    Social Indicators 1973: Statistical Considerations

    1 online resource (PDF, 29 pages)

    Coded Parity Packet Transmission Method for Two Group Resource Allocation

    Gap value control is investigated when the number of source and parity packets is adjusted in a concatenated coding scheme whilst keeping the overall coding rate fixed. Packet-based outer codes, which are generated from bit-wise XOR combinations of the source packets, are used to adjust the number of both source and parity packets. Given the source packets, the number of parity packets, which are bit-wise XOR combinations of the source packets, can be adjusted so that the gap value, which measures the gap between the theoretical and the required signal-to-noise ratio (SNR), is controlled without changing the actual coding rate. Consequently, the required SNR reduces, yielding a lower required energy to realize the transmission data rate. Integrating this coding technique with a two-group resource allocation scheme renders efficient utilization of the total energy to further improve the data rates. With a relatively small set of discrete data rates, the system throughput achieved by the proposed two-group loading scheme is observed to be approximately equal to that of the existing loading scheme, which is operated with a much larger set of discrete data rates. The gain obtained by the proposed scheme over the existing equal rate and equal energy loading scheme is approximately 5 dB. Furthermore, a successive interference cancellation scheme is also integrated with this coding technique, which can be used to decode and provide consecutive symbols for inter-symbol interference (ISI) and multiple access interference (MAI) mitigation. With this integrated scheme, the computational complexity is significantly reduced by eliminating matrix inversions. In the same manner, the proposed coding scheme is also incorporated into a novel fixed energy loading scheme, which distributes packets over parallel channels, to control the gap value of the data rates even though the SNR varies from one code channel to another.
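
As a minimal illustration of what a parity packet formed by bit-wise XOR looks like, the sketch below builds a single parity packet over a group of equal-length source packets and uses it to recover one lost packet. The abstract does not say which XOR combinations the scheme actually uses, so the single-parity layout and the function name are assumptions for illustration only.

```python
def xor_packets(packets):
    """Bit-wise XOR of equal-length packets given as bytes objects."""
    if not packets:
        raise ValueError("at least one packet is required")
    length = len(packets[0])
    if any(len(packet) != length for packet in packets):
        raise ValueError("all packets must have the same length")

    result = bytearray(length)
    for packet in packets:
        for position, byte in enumerate(packet):
            result[position] ^= byte
    return bytes(result)


# Three source packets and one parity packet (an assumed, simplest layout).
source = [b"abcd", b"wxyz", b"1234"]
parity = xor_packets(source)

# If the second packet is lost, XOR-ing the parity with the survivors restores it.
recovered = xor_packets([parity, source[0], source[2]])
print(recovered == source[1])
```

Adding more parity packets over different subsets of the source packets increases redundancy. The abstract's scheme adjusts the numbers of source and parity packets together so that the overall coding rate stays fixed.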